Skip to content

Add prune flag to PermitScrubber #128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 29, 2022

Conversation

seyerian
Copy link
Contributor

@seyerian seyerian commented Apr 9, 2022

By default PermitScrubber leaves behind inner content when scrubbing a
node. Setting the prune flag will remove the node and its content.

Will document in README if this is acceptable.

@flavorjones
Copy link
Member

@seyerian Thanks for submitting this PR! I'll take a closer look this weekend.

@daviluis321
Copy link

Hi, I'm needing this change to remove the content beyond the tag. Can I help with something in MR, with review or coding?

@flavorjones
Copy link
Member

Thanks for the nudge, I'll make time for this in the next few days.

@flavorjones flavorjones force-pushed the allow-permit-scrubber-to-prune branch from b7f819e to 096fd00 Compare May 27, 2022 21:39
@flavorjones
Copy link
Member

Rebased off current master

@flavorjones
Copy link
Member

A couple of quick thoughts on this changeset:

scrub direction

Because pruning removes all the children of an unsafe node, it's more efficient to go top-down rather than bottom-up. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/scrubbers.rb#L122-L132 for Loofah's implementation

new classes rather than conditional branches

Rather than introduce branching on the value of prune, I would prefer if we had a new scrubber class (which could subclass PermitScrubber!) to do this.

I don't like the use of a class attribute in SafeListSanitizer for behavior like this that may vary from use case to use case (e.g., one controller may want to scrub, another may want to prune). (Note that the allowed attributes and tags are generally set by security policy and so maybe these are OK to be in class attributes (though I don't like them very much there, either).) We could make this a constructor value to simplify things.

So the changeset could simplify to look something like:

diff --git a/lib/rails/html/sanitizer.rb b/lib/rails/html/sanitizer.rb
index 5633ca1..97cf7e5 100644
--- a/lib/rails/html/sanitizer.rb
+++ b/lib/rails/html/sanitizer.rb
@@ -110,8 +110,8 @@ class << self
         acronym a img blockquote del ins))
       self.allowed_attributes = Set.new(%w(href src width height alt cite datetime title class name xml:lang abbr))
 
-      def initialize
-        @permit_scrubber = PermitScrubber.new
+      def initialize(prune: false)
+        @permit_scrubber = prune ? PermitPruner.new : PermitScrubber.new
       end
 
       def sanitize(html, options = {})
diff --git a/lib/rails/html/scrubbers.rb b/lib/rails/html/scrubbers.rb
index 09cfe95..5fd59af 100644
--- a/lib/rails/html/scrubbers.rb
+++ b/lib/rails/html/scrubbers.rb
@@ -158,6 +158,19 @@ def scrub_attribute(node, attr_node)
       end
     end
 
+    class PermitPruner < PermitScrubber
+      def initialize
+        super
+        @direction = :bottom_up
+      end
+      
+      private
+
+      def scrub_node(node)
+        node.remove
+      end
+    end
+
     # === Rails::Html::TargetScrubber
     #
     # Where +Rails::Html::PermitScrubber+ picks out tags and attributes to permit in
diff --git a/test/sanitizer_test.rb b/test/sanitizer_test.rb
index 8b0b7ab..92c362c 100644
--- a/test/sanitizer_test.rb
+++ b/test/sanitizer_test.rb
@@ -258,6 +258,11 @@ def test_custom_attributes_overrides_allowed_attributes
     end
   end
 
+  def test_setting_default_prune_affects_sanitization
+    input = '<u>leave me <b>now</b> please</u>'
+    assert_equal '<u>leave me  please</u>', safe_list_prune(input, tags: %w(u))
+  end
+ 
   def test_should_allow_custom_tags
     text = "<u>foo</u>"
     assert_equal text, safe_list_sanitize(text, tags: %w(u))
@@ -601,6 +606,10 @@ def safe_list_sanitize(input, options = {})
     Rails::Html::SafeListSanitizer.new.sanitize(input, options)
   end
 
+  def safe_list_prune(input, options = {})
+    Rails::Html::SafeListSanitizer.new(prune: true).sanitize(input, options)
+  end
+
   def assert_sanitized(input, expected = nil)
     if input
       assert_dom_equal expected || input, safe_list_sanitize(input)
diff --git a/test/scrubbers_test.rb b/test/scrubbers_test.rb
index a825404..06a6c65 100644
--- a/test/scrubbers_test.rb
+++ b/test/scrubbers_test.rb
@@ -4,8 +4,8 @@
 class ScrubberTest < Minitest::Test
   protected
 
-    def assert_scrubbed(html, expected = html)
-      output = Loofah.scrub_fragment(html, @scrubber).to_s
+    def assert_scrubbed(html, expected = html, scrubber: @scrubber)
+      output = Loofah.scrub_fragment(html, scrubber).to_s
       assert_equal expected, output
     end
 
@@ -66,6 +66,13 @@ def test_leaves_only_supplied_tags
     assert_scrubbed html, '<tag>leave me now</tag>'
   end
 
+  def test_prunes_tags
+    html = '<tag>leave me <span>now</span> please</tag>'
+    scrubber = Rails::Html::PermitPruner.new
+    scrubber.tags = %w(tag)
+    assert_scrubbed html, '<tag>leave me  please</tag>', scrubber: scrubber
+  end
+
   def test_leaves_comments_when_supplied_as_tag
     @scrubber.tags = %w(div comment)
     assert_scrubbed('<div>one</div><!-- two --><span>three</span>',

WDYT?

@seyerian
Copy link
Contributor Author

Hey! Thanks for looking at this.

Because pruning removes all the children of an unsafe node, it's more efficient to go top-down rather than bottom-up.

Ah, yes, that makes sense.

I don't like the use of a class attribute in SafeListSanitizer for behavior like this that may vary from use case to use case (e.g., one controller may want to scrub, another may want to prune). (Note that the allowed attributes and tags are generally set by security policy and so maybe these are OK to be in class attributes (though I don't like them very much there, either).) We could make this a constructor value to simplify things.

Agreed. I was simply following the pattern set by tags/attributes, but it's not necessary.

Rather than introduce branching on the value of prune, I would prefer if we had a new scrubber class (which could subclass PermitScrubber!) to do this.

I don't have a strong opinion on this, but I'll offer two thoughts:

  1. If the prune option is kept in PermitScrubber, and SafeListSanitizer passes down the option rather than selecting a different class,
class SafeListSanitizer < Sanitizer
  def initialize(prune: false)
    @permit_scrubber = PermitScrubber.new(prune: prune)
  end

then this allows for TargetScrubber, itself a subclass of PermitScrubber, to do pruning as well. So e.g. one can target figure elements and fully prune them and their contents from a document.

  1. As someone who is not super familiar with this code, I find the name of the subclass you implemented confusing. PermitPruner vs PermitScrubber suggests that one is pruning and one is scrubbing, but actually they are both scrubbing; the first is pruning and the second is stripping. (That is my understanding from the Loofah documentation.) Apologies if that seems pedantic; just offering my perspective as a user.

@seyerian
Copy link
Contributor Author

@flavorjones By your changeset it looks like you are already a fair way to your desired implementation. Feel free to take it from here, or let me know what else I can do to help.

@flavorjones
Copy link
Member

flavorjones commented May 28, 2022

As someone who is not super familiar with this code, I find the name of the subclass you implemented confusing

You are totally right, I didn't pick a good name! I considered PermitScrubber::Pruner and PermitPruneScrubber and didn't like either :shrug: :rofl:

If the prune option is kept in PermitScrubber, and SafeListSanitizer passes down the option rather than selecting a different class, then this allows for TargetScrubber, itself a subclass of PermitScrubber, to do pruning as well

Ah, good point! That would be neat to support. Would you be willing to tweak this PR to have SafeListSanitizer's prune option be an instance variable (default to false but settable via an initializer parameter) instead of a class attribute? And add a test for the TargetScrubber as well?

@seyerian
Copy link
Contributor Author

Pushed two commits making these changes and documenting in README:

  • Made prune option an instance variable, defaulting to false
  • PermitScrubber's direction flips to :top_down if pruning
  • Fixed tests accordingly and tested TargetScrubber

Let me know if there's anything else to be done or if I misread something! And again feel free to tweak it yourself if needed.

@flavorjones
Copy link
Member

Thank you! Will take a look this weekend.

@flavorjones
Copy link
Member

This is a great PR! Thank you so much! I'm going to squash these into a single commit and merge. I'll ship it in a release in a day or so and credit you in the changelog!

By default PermitScrubber leaves behind inner content when scrubbing a
node. Setting the `prune` flag will remove the node *and* its content.

Note that this feature also works with TargetScrubber.
@flavorjones flavorjones force-pushed the allow-permit-scrubber-to-prune branch from 68a5fb3 to b19975f Compare May 29, 2022 14:54
[skip ci]
@flavorjones flavorjones merged commit cb42074 into rails:master May 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants